-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix table function execution without partitioning (v2) #21558
base: master
Are you sure you want to change the base?
Fix table function execution without partitioning (v2) #21558
Conversation
5b36e66
to
a31bbfe
Compare
failure is related. |
I've tested the fix and it helps with #20398 however when writing the data using table function, it still does not distribute the data evenly and there is no way to control the number of files being written. |
between the nodes? there is no "writer scaling" equivalent for table functions, right?
do you mean control number of TF data processor instances? |
between files, when using
correct
control the number of files, for |
a31bbfe
to
b358662
Compare
( rebased to resolve conflicts, no other changes ) |
This is indeed where the writers are created. But I guess this depends on the shape of the plan right before going into the table function operator? |
Planned changes moved out to #21710 |
e1bf212
to
d762983
Compare
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
d762983
to
fa9fdf0
Compare
During migration from TestNG to JUnit `@TestInstance(PER_CLASS)` annotation was added, but it implies single-threaded execution. Restore previous parallelism: either add `@Execution(CONCURRENT)` or inherit it from base class.
Previously, when table function did not declare partitioning, it would be globally distributed, but on a worker node it would run single-threaded and first buffer all data in memory, like a one big WINDOW. After the change, the local execution processes input pages in a streaming fashion. This commit also fixes property derivations for a case where table function is partitioned on empty list of symbols (global grouping).
fa9fdf0
to
a892820
Compare
pagesIndex, | ||
0, | ||
pagesIndex.getPositionCount(), | ||
new TableFunctionDataProcessor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to wrap in a new anonymous class?
Locally, I directly passed the tableFunction
which also resolved the TestJsonTable
failures for me.
Previously, when table function did not declare partitioning it would run single-threaded and first buffer all data in memory, like a one big WINDOW. After the change, the local execution processes input pages in a streaming fashion.
Fixes #20398
Alternative to #21378, with different
TableFunctionDataProcessor
lifecycle. The implementation creates one TableFunctionDataProcessor per operator for streaming processing.